CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
Supervised crowd counting relies heavily on manual labeling, which is
difficult and expensive, especially in dense scenes. To alleviate this problem,
we propose a novel unsupervised framework for crowd counting, named CrowdCLIP.
The core idea is built on two observations: 1) the recent contrastive
pre-trained vision-language model (CLIP) has presented impressive performance
on various downstream tasks; 2) there is a natural mapping between crowd
patches and count text. To the best of our knowledge, CrowdCLIP is the first
work to exploit vision-language knowledge for the counting problem.
Specifically, in the training stage, we exploit a multi-modal ranking loss,
constructing ranking text prompts that match size-sorted crowd patches to
guide the learning of the image encoder. In the testing stage, to deal with the
diversity of image patches, we propose a simple yet effective progressive
filtering strategy that first selects highly likely crowd patches and then
maps them into the language space with various counting intervals. Extensive
experiments on five challenging datasets demonstrate that the proposed
CrowdCLIP achieves superior performance compared to previous unsupervised
state-of-the-art counting methods. Notably, CrowdCLIP even surpasses some
popular fully-supervised methods under the cross-dataset setting. The source
code will be available at https://github.com/dk-liang/CrowdCLIP.
Comment: Accepted by CVPR 2023.
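Below is a minimal sketch of the multi-modal ranking idea the abstract describes: size-sorted crowd patches should match count-ordered text prompts in CLIP's joint embedding space. The encoder interfaces, prompt wording, and margin value are illustrative assumptions, not the authors' released implementation.

```python
# A hedged sketch of a CLIP-style multi-modal ranking loss, assuming
# size-sorted crowd patches and count-ordered ranking prompts.
import torch
import torch.nn.functional as F

def ranking_loss(image_encoder, text_encoder, patches, prompts, margin=0.1):
    """patches[i] is the i-th smallest crowd patch; prompts[i] is its
    matching ranking prompt, e.g. 'There are around {k_i} persons.'"""
    img = F.normalize(image_encoder(patches), dim=-1)  # (N, D)
    txt = F.normalize(text_encoder(prompts), dim=-1)   # (N, D)
    sim = img @ txt.t()                                # (N, N) cosine similarities
    loss = img.new_zeros(())
    # Each patch should be closer to its matched prompt than to any
    # lower-ranked prompt, by at least `margin`.
    for i in range(sim.size(0)):
        for j in range(i):
            loss = loss + F.relu(margin - (sim[i, i] - sim[i, j]))
    return loss / max(sim.size(0), 1)
```

Only the image encoder would be updated by this loss, consistent with the abstract's statement that the prompts guide the image encoder's learning.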
SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
With the development of large language models, many remarkable linguistic
systems like ChatGPT have thrived and achieved astonishing success on many
tasks, showing the incredible power of foundation models. In the spirit of
unleashing the capability of foundation models on vision tasks, the Segment
Anything Model (SAM), a vision foundation model for image segmentation, has
been proposed recently and presents strong zero-shot ability on many downstream
2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be
explored, especially 3D object detection. With this inspiration, we explore
adapting the zero-shot ability of SAM to 3D object detection in this paper. We
propose a SAM-powered BEV processing pipeline to detect objects and get
promising results on the large-scale Waymo open dataset. As an early attempt,
our method takes a step toward 3D object detection with vision foundation
models and presents the opportunity to unleash their power on 3D vision tasks.
The code is released at https://github.com/DYZhang09/SAM3D.
Comment: Technical Report.
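A minimal sketch of what a SAM-powered BEV pipeline could look like, using the public `segment_anything` API. The rasterization grid, the intensity channel at `points[:, 3]`, and the checkpoint path are assumptions; the authors' actual pipeline may differ.

```python
# Hedged sketch: rasterize lidar into a BEV pseudo-image, then let SAM
# segment it zero-shot and read detections off the mask bounding boxes.
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def lidar_to_bev(points, x_range=(-40, 40), y_range=(-40, 40), res=0.1):
    """Rasterize lidar intensity into a 3-channel uint8 BEV pseudo-image."""
    w = int((x_range[1] - x_range[0]) / res)
    h = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((h, w), dtype=np.float32)
    xs = ((points[:, 0] - x_range[0]) / res).astype(int)
    ys = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    np.maximum.at(bev, (ys[keep], xs[keep]), points[keep, 3])  # assumed intensity
    bev = (255 * bev / max(bev.max(), 1e-6)).astype(np.uint8)
    return np.stack([bev] * 3, axis=-1)  # SAM expects an HxWx3 image

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # assumed path
mask_gen = SamAutomaticMaskGenerator(sam)

def detect_bev_boxes(points):
    bev = lidar_to_bev(points)
    masks = mask_gen.generate(bev)  # zero-shot segmentation of the BEV image
    # Each mask's (x, y, w, h) pixel box is a BEV detection; lifting to 3D
    # would add height statistics from the points inside each mask.
    return [m["bbox"] for m in masks]
```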
Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network
The 3D object detection task from lidar or camera sensors is essential for
autonomous driving. Pioneering attempts at multi-modality fusion complement the
sparse lidar point clouds with rich semantic texture information from images at
the cost of extra network designs and overhead. In this work, we propose a
novel semantic passing framework, named SPNet, to boost the performance of
existing lidar-based 3D detection models under the guidance of rich context
painting, at no extra computation cost during inference. Our key design is to
first exploit the potential instructive semantic knowledge within the
ground-truth labels by training a semantic-painted teacher model and then guide
the pure-lidar network to learn the semantic-painted representation via
knowledge passing modules at different granularities: class-wise passing,
pixel-wise passing and instance-wise passing. Experimental results show that
the proposed SPNet can seamlessly cooperate with most existing 3D detection
frameworks with 1~5% AP gain and even achieve new state-of-the-art 3D detection
performance on the KITTI test benchmark. Code is available at:
https://github.com/jb892/SPNet.
Comment: Accepted by ACM MM 2022.
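A hedged sketch of the three knowledge-passing granularities the abstract names, framed as teacher-to-student distillation from the semantic-painted model to the pure-lidar one. The feature shapes, loss choices (MSE/KL), and weights are assumptions for illustration, not SPNet's exact formulation.

```python
# Sketch of class-wise, pixel-wise, and instance-wise passing losses.
import torch
import torch.nn.functional as F

def pixel_passing(student_bev, teacher_bev):
    """Pixel-wise passing: match student BEV features to the painted teacher."""
    return F.mse_loss(student_bev, teacher_bev.detach())

def class_passing(student_logits, teacher_logits, T=2.0):
    """Class-wise passing: distill softened classification distributions."""
    p_t = F.softmax(teacher_logits.detach() / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def instance_passing(student_feats, teacher_feats, fg_mask):
    """Instance-wise passing: align features only inside object regions."""
    return F.mse_loss(student_feats[fg_mask], teacher_feats[fg_mask].detach())

def spnet_style_loss(det_loss, s, t, w=(1.0, 0.5, 0.5)):
    """Total objective: detection loss plus the three passing terms."""
    return (det_loss
            + w[0] * pixel_passing(s["bev"], t["bev"])
            + w[1] * class_passing(s["cls"], t["cls"])
            + w[2] * instance_passing(s["bev"], t["bev"], s["fg_mask"]))
```

The teacher is only used at training time, which is consistent with the abstract's claim of no extra inference cost.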
SOOD: Towards Semi-Supervised Oriented Object Detection
Semi-Supervised Object Detection (SSOD), aiming to explore unlabeled data for
boosting object detectors, has become an active task in recent years. However,
existing SSOD approaches mainly focus on horizontal objects, leaving
multi-oriented objects that are common in aerial images unexplored. This paper
proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD,
built upon the mainstream pseudo-labeling framework. Towards oriented objects
in aerial scenes, we design two loss functions to provide better supervision.
Focusing on the orientations of objects, the first loss regularizes the
consistency between each pseudo-label-prediction pair (a prediction and its
corresponding pseudo-label) with adaptive weights based on their orientation
gap. Focusing on the layout of an image, the second loss regularizes the
similarity between the sets of pseudo-labels and predictions and explicitly
builds their many-to-many relation. Such a global consistency
constraint can further boost semi-supervised learning. Our experiments show
that when trained with the two proposed losses, SOOD surpasses the
state-of-the-art SSOD methods under various settings on the DOTA-v1.5
benchmark. The code will be available at https://github.com/HamPerdredes/SOOD.
Comment: Accepted to CVPR 2023.
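A hedged sketch of what the first, orientation-focused loss could look like: a consistency term whose weight grows with the orientation gap between a prediction and its pseudo-label. The box parameterization and weighting form are assumptions; the second, layout-level loss (a global many-to-many matching between the two box sets) is omitted here.

```python
# Sketch of an orientation-gap-weighted consistency loss on matched pairs.
import torch

def rotation_weighted_consistency(pred_boxes, pseudo_boxes):
    """pred_boxes, pseudo_boxes: (N, 5) matched oriented boxes, each
    parameterized as (cx, cy, w, h, angle) with angle in radians."""
    # Orientation gap wrapped into [0, pi/2].
    gap = torch.remainder(pred_boxes[:, 4] - pseudo_boxes[:, 4], torch.pi)
    gap = torch.minimum(gap, torch.pi - gap)
    # Larger orientation disagreement -> larger adaptive weight, so the
    # loss focuses supervision on pairs whose angles drift apart.
    weight = 1.0 + gap / (torch.pi / 2)
    per_pair = torch.abs(pred_boxes - pseudo_boxes).sum(dim=-1)  # L1 on params
    return (weight * per_pair).mean()
```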
SGM3D: Stereo Guided Monocular 3D Object Detection
Monocular 3D object detection aims to predict the object location, dimension
and orientation in 3D space alongside the object category given only a
monocular image. It poses a great challenge due to its ill-posed nature: the
2D image plane critically lacks depth information. While there exist
approaches leveraging off-the-shelf depth estimation or relying on LiDAR
sensors to mitigate this problem, the dependence on an additional depth model
or expensive equipment severely limits their scalability to generic 3D
perception. In this paper, we propose a stereo-guided monocular 3D object
detection framework, dubbed SGM3D, adapting the robust 3D features learned from
stereo inputs to enhance the features for monocular detection. We innovatively
present a multi-granularity domain adaptation (MG-DA) mechanism to exploit the
network's ability to generate stereo-mimicking features given only monocular
cues. Both the coarse BEV feature-level and the fine anchor-level domain
adaptation are leveraged for guidance in the monocular domain. In
addition, we introduce an IoU matching-based alignment (IoU-MA) method for
object-level domain adaptation between the stereo and monocular predictions to
alleviate the mismatches while adopting the MG-DA. Extensive experiments
demonstrate state-of-the-art results on the KITTI and Lyft datasets.
Comment: 8 pages, 5 figures.
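A hedged sketch of the two adaptation granularities plus an IoU-based matching step in the spirit of MG-DA and IoU-MA. Tensor shapes, the matching threshold, and the `iou_fn` callable are assumptions, not SGM3D's exact formulation.

```python
# Sketch: stereo branch is frozen as a teacher; the monocular branch
# imitates it at BEV-feature and matched-anchor level.
import torch
import torch.nn.functional as F

def bev_feature_adaptation(mono_bev, stereo_bev):
    """Coarse level: monocular BEV features mimic frozen stereo features."""
    return F.mse_loss(mono_bev, stereo_bev.detach())

def iou_match(mono_boxes, stereo_boxes, iou_fn, thr=0.5):
    """IoU-MA-style matching: keep well-overlapping prediction pairs only,
    to avoid aligning features of mismatched objects."""
    ious = iou_fn(mono_boxes, stereo_boxes)        # (N_m, N_s) IoU matrix
    best_iou, best_j = ious.max(dim=1)
    keep = best_iou > thr
    return torch.stack([torch.nonzero(keep).squeeze(1), best_j[keep]], dim=1)

def anchor_level_adaptation(mono_anchor_feats, stereo_anchor_feats, matched):
    """Fine level: align features of anchors matched across the branches."""
    m = mono_anchor_feats[matched[:, 0]]
    s = stereo_anchor_feats[matched[:, 1]].detach()
    return F.smooth_l1_loss(m, s)
```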
Diffusion-based 3D Object Detection with Random Boxes
3D object detection is an essential task for achieving autonomous driving.
Existing anchor-based detection methods rely on empirical, heuristic anchor
settings, which makes the algorithms inelegant. In recent years, we have
witnessed the rise of several generative models, among which diffusion models
show great potential for learning the transformation of two distributions. Our
proposed Diff3Det migrates the diffusion model to proposal generation for 3D
object detection by considering the detection boxes as generative targets.
During training, the object boxes diffuse from the ground truth boxes to the
Gaussian distribution, and the decoder learns to reverse this noise process. In
the inference stage, the model progressively refines a set of random boxes to
the prediction results. We provide detailed experiments on the KITTI benchmark
and achieve promising performance compared to classical anchor-based 3D
detection methods.
Comment: Accepted by PRCV 2023.
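A hedged sketch of the box-diffusion training signal the abstract describes: ground-truth boxes are diffused toward Gaussian noise and a decoder learns to reverse the process. The linear schedule and x_0-prediction parameterization are common DDPM choices assumed here, not necessarily Diff3Det's exact design.

```python
# Sketch of the forward (noising) process on detection boxes.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # assumed linear schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(gt_boxes, t, noise=None):
    """Forward process: noisy boxes x_t from ground-truth boxes x_0.
    gt_boxes: (B, N, box_dim) normalized box parameters; t: (B,) timesteps."""
    if noise is None:
        noise = torch.randn_like(gt_boxes)
    a = alphas_cumprod[t].sqrt().view(-1, 1, 1)
    s = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1, 1)
    return a * gt_boxes + s * noise

# Training step (sketch): the decoder learns to recover x_0 from x_t.
#   t = torch.randint(0, T, (batch,))
#   x_t = q_sample(gt_boxes, t)
#   pred_boxes = decoder(features, x_t, t)   # hypothetical decoder
#   loss = detection_loss(pred_boxes, gt_boxes)
# Inference starts from purely random boxes and iteratively refines them,
# matching the abstract's description of the reverse process.
```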
Microstructure and mechanical properties of refill friction stir spot welded joints: Effects of tool size and welding parameters
A novel refill friction stir spot welding (RFSSW) technique employing large-sized tools is proposed. The microstructure and mechanical properties of joints produced with a large-sized welding tool and a conventional tool are compared. The results show that, for joints produced by both the conventional and the novel tool, the exit line resulting from the sleeve becomes longer with increasing plunge depth, and the diameter of the nugget increases with higher rotational speed. As the plunge depth increases from 2.0 mm to 2.2 mm and then to 2.4 mm, the hook defect bends upwards, lies almost parallel to the lap interface, and bends downwards, respectively. The joints produced with the novel tool have a flatter hook than those produced with the conventional tool. The microstructure evolution of the conventional and novel joints is similar. The tensile-shear and tearing forces measured for the novel joints are higher than those of the conventional joints under the same welding parameters. For conventional joints, the maximum tensile-shear and tearing forces are 8.6 ± 0.1 kN and 4.4 ± 0.2 kN, respectively; for novel joints, they are 10.9 ± 0.1 kN and 5.6 ± 0.1 kN. After the tensile-shear test, three fracture modes are observed: upper-mixed fracture, lower-mixed fracture, and shear fracture. The plunge depth has a pronounced effect on the fracture mode of the joints.